NSF PAR Search | NSF Public Access Repository

Array Programming on GPUs: Challenges and Opportunities

https://doi.org/10.1145/3736112.3736144

Li, Xinyi; Baranowski, Mark; Dam, Harvey; Gopalakrishnan, Ganesh (June 2025, ACM)

Full Text Available
An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores

Valpey, Benjamin; Li, Xinyi; Pai, Sreepathi; Gopalakrishnan, Ganesh (June 2025, NASA Formal Methods)
Titolo, Laura (Ed.)
Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. Unfortunately, the properties of these accelerators are not widely understood and lack sufficient descriptions of their behavior. This makes it difficult for tool builders beyond the original vendor to target or simulate the hardware correctly, or for algorithm designers to be confident in their code. To address these gaps, prior studies have probed the behavior of these units with manually crafted tests. Such tests are cumbersome to design, and adapting them as the accelerators evolve requires repeated manual effort. We present a formal model for the tensor cores of NVIDIA’s Volta, Turing, and Ampere GPUs. We identify specific properties—rounding mode, precision, and accumulation order—that drive these cores’ behavior. We formalize these properties and then use the formalization to automatically generate discriminating inputs that illustrate differences among machines. Our results confirm many of the findings of previous tensor core studies, but also identify subtle disagreements. In particular, NVIDIA’s machines do not, as previously reported, use round-to-zero for accumulation, and their 5-term accumulator requires 3 extra carry-out bits for full accuracy. Using our formal model, we analyze two existing algorithms that use half-precision tensor cores to accelerate single-precision multiplication with error correction. Our analysis reveals that the newer algorithm, designed to be more accurate than the first, is actually less accurate for certain inputs.
Full Text Available
An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores

https://doi.org/10.1007/978-3-031-93706-4_21

Valpey, Benjamin; Li, Xinyi; Pai, Sreepathi; Gopalakrishnan, Ganesh (January 2025, Springer Nature Switzerland)

Full Text Available
Efficient fully Bayesian approach to brain activity mapping with complex-valued fMRI data

https://doi.org/10.1080/02664763.2024.2422392

Wang, Zhengxin; Rowe, Daniel B; Li, Xinyi; Brown, Andrew D (November 2024, Journal of Applied Statistics)

Full Text Available
On large deviations and intersection of random interlacements

https://doi.org/10.3150/23-BEJ1666

Li, Xinyi; Zhuang, Zijie (August 2024, Bernoulli)

Full Text Available
Sharp Asymptotics for Arm Probabilities in Critical Planar Percolation

https://doi.org/10.1007/s00220-024-05028-0

Du, Hang; Gao, Yifan; Li, Xinyi; Zhuang, Zijie (August 2024, Communications in Mathematical Physics)

Full Text Available
Q-Learning Based Methods for Dynamic Treatment Regimes

https://doi.org/10.1007/978-3-031-50690-1_5

Li, Xinyi; Freeman, Nikki L B; Wang, Lily (June 2024, Springer International Publishing)
Zhao, Yichuan; Chen, Ding-Geng (Ed.)
Full Text Available
A fully Bayesian approach for comprehensive mapping of magnitude and phase brain activation in complex-valued fMRI data

https://doi.org/10.1016/j.mri.2024.03.029

Wang, Zhengxin; Rowe, Daniel B; Li, Xinyi; Brown, Andrew D (June 2024, Magnetic Resonance Imaging)

Full Text Available
Prompt-Based Generative News Recommendation (PGNR): Accuracy and Controllability

Li, Xinyi; Zhang, Yongfeng; Malthouse, Edward C (March 2024, Advances in Information Retrieval (ECIR 2024))

Full Text Available
FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

https://doi.org/10.1109/CCGrid59990.2024.00014

Li, Xinyi; Li, Ang; Fang, Bo; Swirydowicz, Katarzyna; Laguna, Ignacio; Gopalakrishnan, Ganesh (May 2024, IEEE)

Full Text Available
